16 research outputs found

    Using R and Bioconductor for proteomics data analysis.

    This review presents how R, the popular statistical environment and programming language, can be used in the frame of proteomics data analysis. A short introduction to R is given, with special emphasis on some of the features that make R and its add-on packages premium software for sound and reproducible data analysis. The reader is also advised on how to find relevant R software for proteomics. Several use cases are then presented, illustrating data input/output, quality control, quantitative proteomics and data analysis. Detailed code and additional links to extensive documentation are available in the freely available companion package RforProteomics. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.

    A draft map of the mouse pluripotent stem cell spatial proteome.

    Knowledge of the subcellular distribution of proteins is vital for understanding cellular mechanisms. Capturing the subcellular proteome in a single experiment has proven challenging, with studies focusing on specific compartments or assigning proteins to subcellular niches with low resolution and/or accuracy. Here we introduce hyperLOPIT, a method that couples extensive fractionation, quantitative high-resolution accurate mass spectrometry with multivariate data analysis. We apply hyperLOPIT to a pluripotent stem cell population whose subcellular proteome has not been extensively studied. We provide localization data on over 5,000 proteins with unprecedented spatial resolution to reveal the organization of organelles, sub-organellar compartments, protein complexes, functional networks and steady-state dynamics of proteins and unexpected subcellular locations. The method paves the way for characterizing the impact of post-transcriptional and post-translational modification on protein location and studies involving proteome-level locational changes on cellular perturbation. An interactive open-source resource is presented that enables exploration of these data. The authors thank Andreas Hühmer, Philip Remes, Jesse Canterbury and Graeme McAlister of Thermo Fisher Scientific, San Jose, CA, USA, for their advice regarding operation of the Orbitrap Fusion. We also thank Mike Deery for assistance with checking sample integrity on the mass spectrometers in the Cambridge Centre for Proteomics on equipment purchased via a Wellcome Trust grant (099135/Z/12/Z), and Brian Hendrich of the Wellcome Trust-MRC Stem Cell Institute in Cambridge and Sean Munro of the MRC Laboratory of Molecular Biology in Cambridge for insightful comments about the data. AC was supported by BBSRC grant (BB/D526088/1). C.M.M. and L.G. were supported by European Union 7th Framework Program (PRIMEXS project, grant agreement number 262067), L.M.B. was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1), and P.C.H. was supported by an ERC Advanced Investigator grant to A.M.A. A.G. was funded through the Alexander S. Onassis Public Benefit Foundation, the Foundation for Education and European Culture (IPEP) and the Embiricos Trust Scholarship of Jesus College Cambridge. T.H. was supported by a Commonwealth Split Site PhD Scholarship. T.N. was supported by an ERASMUS Placement scholarship. This is the final version of the article. It was first available from NPG via http://dx.doi.org/10.1038/ncomms999

    Dynamic proteomic profiling of extra-embryonic endoderm differentiation in mouse embryonic stem cells

    During mammalian pre-implantation development, the cells of the blastocyst’s inner cell mass differentiate into the epiblast and primitive endoderm lineages, which give rise to the fetus and extra-embryonic tissues, respectively. Extra-embryonic endoderm differentiation can be modeled in vitro by induced expression of GATA transcription factors in mouse embryonic stem cells. Here we use this GATA-inducible system to quantitatively monitor the dynamics of global proteomic changes during the early stages of this differentiation event and also investigate the fully differentiated phenotype, as represented by embryo-derived extra-embryonic endoderm (XEN) cells. Using mass spectrometry-based quantitative proteomic profiling with multivariate data analysis tools, we reproducibly quantified 2,336 proteins across three biological replicates and have identified clusters of proteins characterized by distinct, dynamic temporal abundance profiles. We first used this approach to highlight novel marker candidates of the pluripotent state and extra-embryonic endoderm differentiation. Through functional annotation enrichment analysis, we have shown that the downregulation of chromatin-modifying enzymes, the re-organization of membrane trafficking machinery and the breakdown of cell-cell adhesion are successive steps of the extra-embryonic differentiation process. Thus, applying a range of sophisticated clustering approaches to a time-resolved proteomic dataset has allowed the elucidation of complex biological processes which characterize stem cell differentiation and could establish a general paradigm for the investigation of these processes. This work was supported by the European Union 7th Framework Program (PRIME-XS project grant number 262067 to K.S.L., L.G. and C.M.M.), the Biotechnology and Biological Sciences Research Council (BBSRC grant number BB/L002817/1 to K.S.L. and L.G.), as well as a HFSP grant (RGP0029/2010) and a European Research Council (ERC) Advanced Investigator grant to A.M.A. C.S. was supported by an EMBO long term fellowship and a Marie Curie IEF. L.T.Y.C. and K.K.N. were supported by the Medical Research Council (MRC, UK, MC_UP_1202/9) and the March of Dimes Foundation (FY11-436). We also thank Professor Steve Oliver and Dr. A.K. Hadjantonakis for helpful discussions and advice. This is the author accepted manuscript. The final version is available from Wiley via http://dx.doi.org/10.1002/stem.206

    A foundation for reliable spatial proteomics data analysis.

    Quantitative mass-spectrometry-based spatial proteomics involves elaborate, expensive, and time-consuming experimental procedures, and considerable effort is invested in the generation of such data. Multiple research groups have described a variety of approaches for establishing high-quality proteome-wide datasets. However, data analysis is as critical as data production for reliable and insightful biological interpretation, and no consistent and robust solutions have been offered to the community so far. Here, we introduce the requirements for rigorous spatial proteomics data analysis, as well as the statistical machine learning methodologies needed to address them, including supervised and semi-supervised machine learning, clustering, and novelty detection. We present freely available software solutions that implement innovative state-of-the-art analysis pipelines and illustrate the use of these tools through several case studies involving multiple organisms, experimental designs, mass spectrometry platforms, and quantitation techniques. We also propose sound analysis strategies for identifying dynamic changes in subcellular localization by comparing and contrasting data describing different biological conditions. We conclude by discussing future needs and developments in spatial proteomics data analysis. L.G., C.M.M., and M.F. were supported by the European Union 7th Framework Program (PRIME-XS Project, Grant No. 262067). L.M.B. was supported by a BBSRC Tools and Resources Development Fund (Award No. BB/K00137X/1). T.B. was supported by the Proteomics French Infrastructure (ProFI, ANR-10-INBS-08). A.C. was supported by BBSRC Grant No. BB/D526088/1. A.J.G. was supported by BBSRC Grant No. BB/E024777/ and a generous gift from King Abdullah University for Science and Technology, Saudi Arabia. D.J.N.H. was supported by a BBSRC CASE studentship (BB/I016147/1).

    Spatiotemporal proteomic profiling of the pro-inflammatory response to lipopolysaccharide in the THP-1 human leukaemia cell line.

    Protein localisation and translocation between intracellular compartments underlie almost all physiological processes. The hyperLOPIT proteomics platform combines mass spectrometry with state-of-the-art machine learning to map the subcellular location of thousands of proteins simultaneously. We combine global proteome analysis with hyperLOPIT in a fully Bayesian framework to elucidate spatiotemporal proteomic changes during a lipopolysaccharide (LPS)-induced inflammatory response. We report a highly dynamic proteome in terms of both protein abundance and subcellular localisation, with alterations in the interferon response, endo-lysosomal system, plasma membrane reorganisation and cell migration. Proteins not previously associated with an LPS response were found to relocalise upon stimulation, the functional consequences of which are still unclear. By quantifying proteome-wide uncertainty through Bayesian modelling, a necessary role for protein relocalisation and the importance of taking a holistic overview of the LPS-driven immune response have been revealed. The data are showcased as an interactive application freely available for the scientific community.

    Learning from Heterogeneous Data Sources: An Application in Spatial Proteomics.

    Sub-cellular localisation of proteins is an essential post-translational regulatory mechanism that can be assayed using high-throughput mass spectrometry (MS). These MS-based spatial proteomics experiments enable us to pinpoint the sub-cellular distribution of thousands of proteins in a specific system under controlled conditions. Recent advances in high-throughput MS methods have yielded a plethora of experimental spatial proteomics data for the cell biology community. Yet, there are many third-party data sources, such as immunofluorescence microscopy or protein annotations and sequences, which represent a rich and vast source of complementary information. We present a unique transfer learning classification framework that utilises a nearest-neighbour or support vector machine system, to integrate heterogeneous data sources to considerably improve on the quantity and quality of sub-cellular protein assignment. We demonstrate the utility of our algorithms through evaluation of five experimental datasets, from four different species in conjunction with four different auxiliary data sources to classify proteins to tens of sub-cellular compartments with high generalisation accuracy. We further apply the method to an experiment on pluripotent mouse embryonic stem cells to classify a set of previously unknown proteins, and validate our findings against a recent high resolution map of the mouse stem cell proteome. The methodology is distributed as part of the open-source Bioconductor pRoloc suite for spatial proteomics data analysis. LMB was supported by a BBSRC Tools and Resources Development Fund (Award BB/K00137X/1) and a Wellcome Trust Technology Development Grant (108441/Z/15/Z). LG was supported by the European Union 7th Framework Program (PRIME-XS project, grant agreement number 262067) and a BBSRC Strategic Longer and Larger Award (Award BB/L002817/1). DW and OK acknowledge funding from the European Union (PRIME-XS, GA 262067) and Deutsche Forschungsgemeinschaft (KO-2313/6-1). This is the final version of the article. It first appeared from PLOS via https://doi.org/10.1371/journal.pcbi.100492

    Additional Precursor Purification in Isobaric Mass Tagging Experiments by Traveling Wave Ion Mobility Separation (TWIMS)

    Despite the increasing popularity of data-independent acquisition workflows, data-dependent acquisition (DDA) is still the prevalent method of LC–MS-based proteomics. DDA is the basis of the isobaric mass tagging technique, a powerful MS2 quantification strategy that allows coanalysis of up to 10 proteomics samples. A well-documented limitation of DDA, however, is precursor coselection, whereby a target peptide is coisolated with other ions for fragmentation. Here, we investigated if additional peptide purification by traveling wave ion mobility separation (TWIMS) can reduce precursor contamination using a mixture of Saccharomyces cerevisiae and HeLa proteomes. In accordance with previous reports on FAIMS-Orbitrap instruments, we find that TWIMS provides a remarkable improvement (on average 2.85 times) in the signal-to-noise ratio for sequence ions. We also report that TWIMS reduces reporter ion contamination by around one-third (to 14–15% contamination) and even further (to 6–9%) when combined with a narrowed quadrupole isolation window. We discuss challenges associated with applying TWIMS purification to isobaric mass tagging experiments, including correlation between ion m/z and drift time, which means that coselected peptides are expected to have similar mobility. We also demonstrate that labeling results in peptides having more uniform m/z and drift time distributions than observed for unlabeled peptides. Data are available via ProteomeXchange with identifier PXD001047.

    Principal components analysis plot (PCA) of the mouse stem cell dataset.

    Proteins are clustered according to their density gradient distributions. Each point on the PCA plot represents one protein. Filled circles are the original protein markers used in classification, hollow circles show new locations as assigned by the SVM TL classifier. The 4 proteins GTR3_MOUSE, SNTB2_MOUSE, PAR6B_MOUSE and ADA17_MOUSE that were found in the SVM TL method and not in an SVM classification with LOPIT only are highlighted.